The genome of a globally invasive passerine, the common myna, Acridotheres tristis

Abstract In an era of global climate change, biodiversity conservation is receiving increased attention. Conservation efforts are greatly aided by genetic tools and approaches, which seek to understand patterns of genetic diversity and how they impact species health and their ability to persist under future climate regimes. Invasive species offer vital model systems in which to investigate questions regarding adaptive potential, with a particular focus on how changes in genetic diversity and effective population size interact with novel selection regimes. The common myna (Acridotheres tristis) is a globally invasive passerine and is an excellent model species for research both into the persistence of low-diversity populations and the mechanisms of biological invasion. To underpin research on the invasion genetics of this species, we present the genome assembly of the common myna. We describe the genomic landscape of this species, including genome-wide allelic diversity, methylation, repeats, and recombination rate, as well as an examination of gene family evolution. Finally, we use demographic analysis to identify that some native regions underwent a dramatic population increase between the two most recent periods of glaciation, and reveal artefactual impacts of genetic bottlenecks on demographic analysis.


Appendix 2: Manual genome curation
The initial nanopore-assembled and polished genome assembly was mapped, using MINIMAP2 v2.24 1, to the zebra finch (Taeniopygia guttata) genome assembly (GCF_003957565.2), as well as to a Vertebrate Genomes Project (VGP) common myna genome assembly (GCA_027559615.1). These alignments were visualised as dot plots using DGENIES v1.4.0 2, and the dot plots were manually inspected for signatures of misassemblies (Fig. S1). To investigate these putative misassemblies, we used MINIMAP2 to map reads (raw ONT and trimmed 10x reads) to the assembly so that coverage around potential misassembly points could be examined. In addition to the raw reads, we aligned the European starling (Sturnus vulgaris) genome assembly (GCA_023376015.1) 3 to provide extra context around potential misassembly points. These three tracks were visualised in IGV v2.15.4 4 and locations of misassembly signatures were examined. Contigs that contained evidence of potential misassembly (for examples, see Fig. S2) were manually broken using BEDTOOLS v2.30 getfasta 5. We used DIPLOIDOCUS v1.1.1 6 to check for assembly artefacts (runmode=purgehaplotig) and vector contamination (runmode=vecscreen, using the NIH UniVec database). NUMTFINDER v0.5.1 7 was used to identify the mitochondrial genome and search for misassemblies of it, using the common myna mitochondrial genome as a reference (CM050619.1). Contigs shorter than 1,000 bp were identified using BBMAP v38.81 8 and removed. BLOBTOOLS v1.0 9 was used to assess whether any of the assembled contigs belonged to contaminants in the raw sequencing data, using BLASTN v2.13.0 10 (-task megablast) to build the database, with the ONT reads mapped to the genome using MINIMAP2. Sequences that were identified as contaminants were removed from the assembly (Fig. S3).
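The two purely mechanical curation steps described above, breaking a contig at a chosen coordinate and discarding contigs shorter than 1,000 bp, can be sketched in plain Python. This is an illustration only and not the procedure used here, which relied on BEDTOOLS getfasta and BBMAP; the function names, the `_partN` naming scheme, and the toy sequences are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

MIN_CONTIG_LEN = 1_000  # length threshold stated in the text


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:].split()[0]  # keep only the identifier
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def break_contig(name: str, seq: str, breakpoints: Iterable[int]) -> List[Tuple[str, str]]:
    """Split a contig at 0-based breakpoints (hypothetical '_partN' naming)."""
    inner = sorted({b for b in breakpoints if 0 < b < len(seq)})
    cuts = [0] + inner + [len(seq)]
    return [
        (f"{name}_part{i + 1}", seq[start:end])
        for i, (start, end) in enumerate(zip(cuts, cuts[1:]))
    ]


def drop_short_contigs(records: Iterable[Tuple[str, str]],
                       min_len: int = MIN_CONTIG_LEN) -> Dict[str, str]:
    """Keep only contigs of at least min_len bases."""
    return {name: seq for name, seq in records if len(seq) >= min_len}


# Toy input: sequences are made up purely to exercise the functions.
demo = [">contig_a", "A" * 600, "C" * 600,
        ">contig_b", "G" * 999,
        ">contig_c", "T" * 1000]
kept = drop_short_contigs(parse_fasta(demo))
pieces = break_contig("contig_x", "AAAACCCC", [4])
```

In this toy run `contig_b` (999 bp) is dropped while `contig_a` and `contig_c` are retained, and `contig_x` is split into two 4 bp pieces at position 4.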

Appendix 3: Z chromosome genomic rearrangement
Prior to use as a reference for synteny scaffolding, the VGP A. tristis genome was examined for synteny against the T. guttata genome using MINIMAP2 and visualised in DGENIES (Supplementary materials: Fig. S5). The dot plot revealed that an approximately 17 Mb section of a T. guttata autosomal chromosome was assembled onto the end of the Z chromosome of VGP T. guttata. To investigate whether this was a genuine genomic rearrangement in A. tristis relative to the zebra finch or a misassembly, we mapped our raw ONT reads to VGP T. guttata using MINIMAP2 and visualised the alignments in IGV (Supplementary materials: Fig. S5). High ONT read concatenation around the scaffold join site did not support the linking of these two sequences in our assembly. We took a conservative approach and excluded the equivalent region in our assembly from synteny-based scaffolding, but named it based on T. guttata synteny during scaffold renaming.

Figure S2 | Examples of manual assembly curation. Alignments were produced using MINIMAP2 v2.24, processed in SAMTOOLS v1.13, and visualised in IGV v2.15.4. Window numbers correspond to the exemplar windows from Figure S1 that were investigated during manual genome curation and were deemed to be 1) a misassembly, 2) not a misassembly, and 3) a misassembly.

Figure S3 | BLOBTOOLS summary plot of genome assembly contamination, produced using BLAST (-task megablast) to assign taxonomic identities to the sequences.

Figure S7 | Pseudo-chromosome sizes with macro and micro chromosome numbers, based on zebra finch synteny, labelled.

Figure S8 | Correlation plots of the methylation calls across the three flow cells obtained by linear regression in R.

Figure S9 | Histogram of Acridotheres tristis linked read gDNA k-mer counts. Panel (a) was estimated using JELLYFISH from k=7 to k=200. Vertical red line denotes histogram peak at k=31. Genome length = 36,028,219,169 / 31 = 1,162,200,618 bp. Completeness was calculated by dividing the final genome assembly size by this k-mer genome size estimate: 1,040,603,622 / 1,162,200,618 = 89.54% complete. Panel (b) was estimated by MERQURY and reported a genome completeness score of 91.17% and a consensus quality value (QV) of 44.1.
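The arithmetic in the Figure S9 caption can be reproduced with a short Python sketch. It assumes that 36,028,219,169 is the total k-mer count and that 31 is the histogram peak value used as the divisor, as the caption's formula implies; the function and variable names are illustrative only.

```python
# Values taken directly from the Figure S9 caption.
TOTAL_KMERS = 36_028_219_169    # numerator of the genome length formula
PEAK_VALUE = 31                 # histogram peak marked by the red line
ASSEMBLY_SIZE = 1_040_603_622   # final assembly size in bp


def estimate_genome_size(total_kmers: int, peak_value: int) -> int:
    """Genome size estimate: total k-mers divided by the histogram peak."""
    return round(total_kmers / peak_value)


def completeness_percent(assembly_size: int, genome_size: int) -> float:
    """Assembly size as a percentage of the estimated genome size."""
    return 100 * assembly_size / genome_size


genome_size = estimate_genome_size(TOTAL_KMERS, PEAK_VALUE)
completeness = completeness_percent(ASSEMBLY_SIZE, genome_size)

print(f"Estimated genome size: {genome_size:,} bp")
print(f"Completeness: {completeness:.2f}%")
```

Run as written, this gives an estimated genome size of 1,162,200,618 bp and a completeness of 89.54%, matching the values reported in the caption.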

Figure S10 | Repeat content of the Acridotheres tristis AcTris_vAus2.0 genome (left) and the VGP A. tristis genome (right). Panels (a) and (b) are profiles of genome transposable element content as generated by EARLYGREY. Panel (c) is a REPEATMASKER repeat annotation of the two genome assemblies.

Figure S12 | REVIGO summary of the biological processes captured in gene ontology terms associated with orthogroups determined to be under (a) significant expansion and (b) significant contraction in A. tristis. Colour represents the number of separate significantly expanded or contracted orthogroups that the GO term was independently annotated to using INTERPROSCAN, while log size represents the weighting of that GO term in the dataset once redundant GO terms were collapsed during REVIGO analysis and visualisation. Therefore, both colour and circle size capture the level of representation of the term but do so at different steps of the analysis.

Figure S13 | PSMC bootstrapping plots with colour scheme matching Fig. 6 from the main manuscript indicated next to each plot.

Table S2 | ONT read batches and corresponding program version numbers used for raw data processing at ONT assembly steps, assessed using NanoStats.

Table S3 | Ensembl reference species used in the GEMOMA genome annotation.

Table S4 | Genome size and completion statistics produced by JELLYFISH for a range of initial k-mer values. Completion statistic based on the final AcTris_vAus2.0 assembly size of 1,040,603,622 bp.